Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Allow superset to be deployed under a prefixed URL #30134

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

martyngigg
Copy link
Contributor

SUMMARY

Introduces changes in the front and backend to allow Superset to be deployed under a prefixed-URL path rather than the root of the domain. A new environment variable BASE_PATH has been introduced into webpack, alongside the existing ASSET_BASE_URL, to define the prefix.

The frontend has been updated to include the prefix in any resource links along with two new helper utilities, assetUrl & pathUtils, to help construct the paths. The SupersetClient has been update to remove the unused baseUrl field and rename it as basePath such that the object can be initialized with the base path and calling any endpoints through this class can use the same relative link as before. QUESTION: Have I broken an API here?

The backend has been refactored to avoid using hardcoded paths in redirects and instead uses Flask.url_for to construct reference. This requires that the proxy server set the X-Forwarded-Prefix header along with setting ENABLE_PROXY_FIX=True in superset_config.py

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@rusackas
Copy link
Member

rusackas commented Sep 5, 2024

Thanks for tackling this... it's a popular request, no doubt about it. If you think it's relevant, we might also want to add an entry in the documentation somewhere.

@martyngigg
Copy link
Contributor Author

Thanks for tackling this... it's a popular request, no doubt about it. If you think it's relevant, we might also want to add an entry in the documentation somewhere.

Yes agreed I definitely need to add some documentation for how to use this. I wanted to wait until I was more confident that the general approach was acceptable before working on describing how to use it. In terms of a review as part of #26319 is there anything I need to do at the moment?

);
this.baseUrl = url.href.replace(/\/+$/, ''); // always strip trailing slash
this.basePath = basePath.replace(/\/+$/, ''); // always strip trailing slash

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data High

This
regular expression
that depends on
library input
may run slow on strings with many repetitions of '/'.
This
regular expression
that depends on
library input
may run slow on strings with many repetitions of '/'.
@mistercrunch
Copy link
Member

mistercrunch commented Sep 19, 2024

I think this may be quite an undertaking and might have to be tackled in phases. Would love to see this done :)

I remember giving it a shot early on in the project at least once, and never wrapped up the work. My main advice would be to avoid taking in side mission or bigger refactor while tackling this.

@@ -17,19 +17,23 @@
#
set -e

# zstd exe is required by simple-zstd package
apt-get update
apt-get install -y zstd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a recent fix in master for this

@ravikantkml
Copy link

Thanks for tackling this... it's a popular request, no doubt about it. If you think it's relevant, we might also want to add an entry in the documentation somewhere.

Yes agreed I definitely need to add some documentation for how to use this. I wanted to wait until I was more confident that the general approach was acceptable before working on describing how to use it. In terms of a review as part of #26319 is there anything I need to do at the moment?

Thanks for these changes.
Can you please mention values for BASE_PATH and ASSET_BASE_URL. Just wanted to check these changes with superset and nginx.

@kasper-rutten
Copy link

For what it's worth, we've been running this fork to serve superset on a path prefix in a Kubernetes(K3S) + traefik context without any problems so far. Though testing has been limited.

Thanks for the work so far

When ENABLE_PROXY_FIX is configured url_for will ensure any
base prefix is correctly dealt with.
Allows an application prefix to be defined separately to the
URL prefix used for the static assets. Webpack looks for a BASE_PATH
environment variable and injects this into the built assets.
@martyngigg martyngigg force-pushed the allow_prefixed_paths branch from 93d66c7 to 09a05b3 Compare October 15, 2024 14:23
@github-actions github-actions bot added the doc Namespace | Anything related to documentation label Oct 15, 2024
@martyngigg
Copy link
Contributor Author

Thanks for tackling this... it's a popular request, no doubt about it. If you think it's relevant, we might also want to add an entry in the documentation somewhere.

Yes agreed I definitely need to add some documentation for how to use this. I wanted to wait until I was more confident that the general approach was acceptable before working on describing how to use it. In terms of a review as part of #26319 is there anything I need to do at the moment?

Thanks for these changes. Can you please mention values for BASE_PATH and ASSET_BASE_URL. Just wanted to check these changes with superset and nginx.

Sorry for the delay in replying to this. I've had zero time to spend on this lately.

The latest commits rebase on the current master and I have also added some documentation to the configuring superset pages that describe the new variables with an example nginx.conf. Does that help?

@martyngigg
Copy link
Contributor Author

@rusackas I think this is ready for review but in terms of it being related to SIP-112 is that the right thing to do in terms of the process?

@rusackas
Copy link
Member

If you think it's ready for review, I'd say you can go ahead and move it from Draft to "Ready for Review" - though it looks like CI might need a little more love still.

As for the SIP, since @sfirke reopened the GitHub issue, I set the SIP itself back to "pre-discussion" state for its official consensus. Did you want to re-kindle that pre-existing SIP, or would it be easier to open a new one? I'm happy to help you carry that SIP through to fruition if you'd like. Feel free to DM me on slack if it's easier.

@ascendlin
Copy link

When clicking on the user list or role list, and then clicking on datasets, a 404 error appears.

@ravikantkml
Copy link

Thanks for tackling this... it's a popular request, no doubt about it. If you think it's relevant, we might also want to add an entry in the documentation somewhere.

Yes agreed I definitely need to add some documentation for how to use this. I wanted to wait until I was more confident that the general approach was acceptable before working on describing how to use it. In terms of a review as part of #26319 is there anything I need to do at the moment?

Thanks for these changes. Can you please mention values for BASE_PATH and ASSET_BASE_URL. Just wanted to check these changes with superset and nginx.

Sorry for the delay in replying to this. I've had zero time to spend on this lately.

The latest commits rebase on the current master and I have also added some documentation to the configuring superset pages that describe the new variables with an example nginx.conf. Does that help?

Thanks a lot for the response.
We tried the changes as recommended and observed that most of the things are working when we specify prefix. However there are few things where prefix is not applied correctly as shown in the image below (chart menu has prefix applied correctly)
image

Also dashboard menu items export to PDF/image not working (Dashboard -> Download -> Export to PDF/Download as Image)

@ravikantkml
Copy link

Thanks for tackling this... it's a popular request, no doubt about it. If you think it's relevant, we might also want to add an entry in the documentation somewhere.

Yes agreed I definitely need to add some documentation for how to use this. I wanted to wait until I was more confident that the general approach was acceptable before working on describing how to use it. In terms of a review as part of #26319 is there anything I need to do at the moment?

Thanks for these changes. Can you please mention values for BASE_PATH and ASSET_BASE_URL. Just wanted to check these changes with superset and nginx.

Sorry for the delay in replying to this. I've had zero time to spend on this lately.
The latest commits rebase on the current master and I have also added some documentation to the configuring superset pages that describe the new variables with an example nginx.conf. Does that help?

Thanks a lot for the response. We tried the changes as recommended and observed that most of the things are working when we specify prefix. However there are few things where prefix is not applied correctly as shown in the image below (chart menu has prefix applied correctly) image

Also dashboard menu items export to PDF/image not working (Dashboard -> Download -> Export to PDF/Download as Image)

I tried to fix above issue of missing prefixes in top-level menu. Please check this PR 2

@Dinesh-4320
Copy link

The solution really works. Solved a long term issue in our deployment behind proxies. Hoping to get merged soon with official code base

@harri2012
Copy link

Very good, looking forward to merging the code soon

@watercraft
Copy link

For whom it may concern, I'm testing with a port of this work to 3.1.1rc1 available from watercraft:superset branch prefix

@jeanpommier
Copy link

Hi,
Good job there, thank you very much for tackling this. I have just discovered that Superset did not support that and was feeling disappointed.

I can see this PR is still flagged as DRAFT. What's left to do to get it "Ready for Review" ?

@martyngigg do you need help in completing this PR ? I'm quite a superset newbie, but I'm willing to provide some help if necessary to get this merged.

@martyngigg
Copy link
Contributor Author

Dear all,

Thank you so much for taking the time to test out these changes and I apologise (again!) for the lack of response to any comments. @ravikantkml Thanks for the PR fixing the menu links - I think that looks okay and I'll merge it.

I think the remaining issues are fixing the issues with screenshot generation. I had a brief look at this today and I'm having trouble getting them working even without any prefixed path set - I must have an issue in my development setup.

Today is my last day of work before Christmas break but I will have time when I return to pick this up properly. I'll aim to get it to "Ready for Review" as soon as possible after the holidays. @jeanpommier Thanks for the offer of help too - If you wanted to try and tackle the screenshot generation problem then that would be super helpful but equally I'm happy to look at it after the break.

Happy Holidays! 🎅🏻

@landryb
Copy link

landryb commented Dec 30, 2024

testing the PR here outside of the docker context (eg pip install from that github branch branch, rebuild the assets with BASE_PATH set), that work is much welcome.

some notes so far.. i think it should be mentioned that APP_ICON and FAVICON needs BASE_PATH prepended, otherwise it generates 404's here with the default value from config.py

looking a bit more at the code/PR, all assets below STATIC_ASSETS_PREFIX also need to default to BASE_PATH ? that fixes the FAVICON while here.

with this config the superset homepage loads fine with all its assets/images/favicon:

BASE_PATH = "/dashboards"
APP_ICON = BASE_PATH + "/static/assets/images/superset-logo-horiz.png"
STATIC_ASSETS_PREFIX = BASE_PATH         

edit oh, i was pretty sure i saw those bits somewhere.. they are in the docker part of the PR in https://github.com/apache/superset/pull/30134/files#diff-415de08cbf9e8633d6cf2c439df7aa08bf9ebce684833ee837d6ce7d7ab7d53b. should those bits be documented, or copied over to the default superset config.py ?

@martyngigg
Copy link
Contributor Author

Happy New Year!

I've investigated further and found the issue with "Export to PDF" & "Download image" not working on dashboards. The issue seems to be the interaction of the headless browser and WEBDRIVER_BASEURL. In the docker compose setup superset_config.py suggests setting WEBDRIVER_BASEURL=http://superset_app:8088/, so accessing the superset instance directly and skipping the nginx reverse proxy. This is fine when there is no url prefix set as the paths that both the reverse proxy and the superset app see are the same. When there is a url prefix set it is baked into the javascript files that webpack builds and when accessing the superset app directly the extra prefix part of the path raises an error in the superset app and the screenshot generation fails.

A fix/workaround (??) seems to be to set WEBDRIVER_BASEURL = f"http://superset_nginx{base_path}/" but this also requires removing this with statement that seems to set a test context that doesn't know about the prefix. I'm not sure of the knock on effects of this. @rusackas Would you know someone who could comment on this or possibly suggest an alternative approach?

@jeanpommier
Copy link

Hello ! Happy new year !

This is fine when there is no url prefix set as the paths that both the reverse proxy and the superset app see are the same. When there is a url prefix set it is baked into the javascript files that webpack builds and when accessing the superset app directly the extra prefix part of the path raises an error in the superset app and the screenshot generation fails.

Well, I might have a solution for this one. I didn't get the time to look at the image generation task you proposed, but I was rather investigating around the path prefix topic. My concern was to avoid relying on a reverse-proxy for the path-prefix thing. IMO, it makes things complicated. For instance the issue you met above: dev setup (without docker compo) and prod-like setup with a reverse proxy works differently, with a different frontend where the path is hard-coded.
Looking on the backend side of it, my first idea was that we maybe didn't use the Flask blueprints to their full extent, but it turns out this would not be that easy to adapt to our needs. And maybe not a clean solution.

But, I stumbled across this blog post where I discovered that WSGI protocol supports a parameter that sets such a path prefix, and that is handled on the WSGI side of it (e.g. gunicorn). I tested it (on top of your PR since we need the url_for links) and it works like a charm.

What does it change ?

Supposing you choose analytics as the path prefix, the backend is accessed on http://superset_app:8088/analytics/. and the reverse-proxy has a much simpler job of just routing /analytics to http://superset_app:8088/analytics/.
On your above-mentioned issue, this means that the paths will concord between the front and back end, so it should work without even needing the WEBDRIVER_BASEURL fix you propose.

I'm still quite uncomfortable at the necessity to build the frontend to support a path other than root. I haven't made experiments yet on this, but wouldn't it be conceivable that the frontend looks for a config instruction from the backend to get this information ?

@martyngigg
Copy link
Contributor Author

Hello ! Happy new year !

This is fine when there is no url prefix set as the paths that both the reverse proxy and the superset app see are the same. When there is a url prefix set it is baked into the javascript files that webpack builds and when accessing the superset app directly the extra prefix part of the path raises an error in the superset app and the screenshot generation fails.

Well, I might have a solution for this one. I didn't get the time to look at the image generation task you proposed, but I was rather investigating around the path prefix topic. My concern was to avoid relying on a reverse-proxy for the path-prefix thing. IMO, it makes things complicated. For instance the issue you met above: dev setup (without docker compo) and prod-like setup with a reverse proxy works differently, with a different frontend where the path is hard-coded. Looking on the backend side of it, my first idea was that we maybe didn't use the Flask blueprints to their full extent, but it turns out this would not be that easy to adapt to our needs. And maybe not a clean solution.

But, I stumbled across this blog post where I discovered that WSGI protocol supports a parameter that sets such a path prefix, and that is handled on the WSGI side of it (e.g. gunicorn). I tested it (on top of your PR since we need the url_for links) and it works like a charm.

What does it change ?

Supposing you choose analytics as the path prefix, the backend is accessed on http://superset_app:8088/analytics/. and the reverse-proxy has a much simpler job of just routing /analytics to http://superset_app:8088/analytics/. On your above-mentioned issue, this means that the paths will concord between the front and back end, so it should work without even needing the WEBDRIVER_BASEURL fix you propose.

@jeanpommier Thanks for exploring and trying out the idea with having the backend also run on the prefix. I remember trying something like this a while ago and for some reason it didn't work properl. I think I may have been a bit hasty in dropping the idea and it was likely my lack of understanding of Superset that was the problem. I certainly agree relying on the proxy for the prefix handling is complicated. I'll try again since it certainly seems to just require an environment variable :)

I'm still quite uncomfortable at the necessity to build the frontend to support a path other than root. I haven't made experiments yet on this, but wouldn't it be conceivable that the frontend looks for a config instruction from the backend to get this information ?

Yes this is also annoying. Now I'm a bit more familiar with the code base I wondered about including it in this

bootstrap_data = {
config dictionary that is rendered into the templates? Anyone know if this is a good/bad idea?

@wuqicyber
Copy link

wuqicyber commented Jan 7, 2025

Happy New Year!  新年快乐!

I've investigated further and found the issue with "Export to PDF" & "Download image" not working on dashboards. The issue seems to be the interaction of the headless browser and WEBDRIVER_BASEURL. In the docker compose setup superset_config.py suggests setting WEBDRIVER_BASEURL=http://superset_app:8088/, so accessing the superset instance directly and skipping the nginx reverse proxy. This is fine when there is no url prefix set as the paths that both the reverse proxy and the superset app see are the same. When there is a url prefix set it is baked into the javascript files that webpack builds and when accessing the superset app directly the extra prefix part of the path raises an error in the superset app and the screenshot generation fails.我已经进一步调查并发现“导出为 PDF”和“下载图片”在仪表板上无法工作的问题。问题似乎是 headless 浏览器和 WEBDRIVER_BASEURL 之间的交互。在 docker compose 设置中, superset_config.py 建议设置 WEBDRIVER_BASEURL=http://superset_app:8088/ ,因此直接访问 superset 实例并跳过 nginx 反向代理。如果没有设置 URL 前缀,这是可以的,因为反向代理和 superset 应用看到的路径是相同的。当设置了 URL 前缀时,它会被烘焙到 webpack 构建的 javascript 文件中,并且当直接访问 superset 应用时,路径的额外前缀部分会在 superset 应用中引发错误,截图生成失败。

A fix/workaround (??) seems to be to set WEBDRIVER_BASEURL = f"http://superset_nginx{base_path}/" but this also requires removing this with statement that seems to set a test context that doesn't know about the prefix. I'm not sure of the knock on effects of this. @rusackas Would you know someone who could comment on this or possibly suggest an alternative approach?一个修复/解决方案似乎是设置 WEBDRIVER_BASEURL = f"http://superset_nginx{base_path}/" ,但这也需要删除这个 with 语句,该语句似乎设置了一个不知道前缀的测试上下文。我不确定这会有什么副作用。您认识可以对此发表评论或可能提出替代方法的人吗?

@martyngigg try APPLICATION_ROOT, by setting it to a specific perfix, i found it works

@jeanpommier
Copy link

I'll try again since it certainly seems to just require an environment variable :)
Hi @martyngigg

The trick is, you still need to set the BASE_PATH env var for the frontend (needs for now to be built with the prefix), but not for the superset backend, since this is the WSGI server that handles the prefix.

Here is a patch that should work, to run it with the gunicorn server. I've recently rebased your PR against master, so you might get a bit of noise in the docker-compo: gunicorn_script_name.patch.gz

And a similar patch to run the same principle with the flask dev server. I still haven't looked at a common way to get this to work, but it shouldn't be too complicated (I guess there is some flag somewhere indicating if we run the dev server):
flask_devserver_script_name.patch.gz

@jeanpommier
Copy link

Yes this is also annoying. Now I'm a bit more familiar with the code base I wondered about including it in superset/views/base.py#L333 bootstrap_data config dictionary that is rendered into the templates? Anyone know if this is a good/bad idea?

Hi @martyngigg from me point of view, it looks like the perfect place, but I'm no Superset developer.

I've been playing around quite a bit on this topic, but am still struggling about dynamic configuration of the path prefix for the frontend (i.e. without dedicated build). Did you make progress on your side ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Related to the REST API dependencies:npm doc Namespace | Anything related to documentation packages review:draft size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.